Traditional metrics of node influence such as degree or betweenness identify highly influential 
nodes, but are rarely accurate enough to usefully quantify the spreading power of the nodes which are not. 
Such nodes are the vast majority of the network, and the most likely entry points for novel influences, 
be they pandemic disease or new ideas. Several recent works have suggested metrics based on path 
counting. The current work proposes instead using the expected number of infected-susceptible 
edges, and shows that this measure predicts spreading power in discrete time, continuous time, and 
competitive spreading processes simulated on large random networks and on real world networks. 
Applied to the Ugandan road network, it predicts that Ebola is unlikely to pose a pandemic threat. 

PACS numbers: 87.10.Mn, 87.19.X- 



Networks have become the premier approach to describing
spreading processes such as epidemics because
they express the heterogeneity of interactions [1]. Early
metrics of node influence focused on identifying highly
influential nodes from the macroscopic structure of the
network, such as degree [2], k-shell [3], or centrality [4] [5].
These measures, however, only rank the nodes without
quantifying the outcome [6], and do not account for the
dynamics of the spreading process [7].

Highly influential nodes are unlikely to be disease entry
points. They are, by definition, rare. Nor are they
biological targets. More than half of all new or emerging
infectious disease agents in humans are zoonotic in origin
[8] [9] and thus closer to the periphery of society. Highly
contagious diseases such as pandemic influenza circulate
at low levels for months or years before epidemic breakout
[10]. Worryingly, structural measures of node centrality
may considerably underestimate the spreading power of
non-hub nodes [11].

Only recently have measures been proposed which take
into account the spreading process. To date, and to the
best of our knowledge, these are limited to path counting
approaches. Path counting was first proposed as the
Accessibility metric, the exponential of the entropy of
the number of paths [12] [13]. Path counting is also the
basis of the impact [6] and the dynamic influence [7],
both of which include transition/transmission probabilities
when calculating path length. A recent comparison
of a number of measures, including accessibility, betweenness
centrality, clustering, degree, and k-shell, found that
the accessibility and the (weighted) degree were most predictive
of an individual node's spreading potential across
a range of different network models [14]. Spreading processes,
however, are not constrained to follow paths; they
form clusters.

We here show that measuring node spreading power
by taking the expectation of the degree of a disease cluster
seeded from a single node accurately quantifies the
spreading power of that node. Spreading power is as- 
sessed in three contexts. The simplest epidemic model is 
a susceptible-infected (SI) epidemic. Since infected nodes 
do not recover, such a process will in time infect every 
node connected to a seed node. When transmission is 
modelled in continuous time, node spreading power can 
be measured in terms of the expected time until half the 
network is covered. A model allowing for recovery, the 
susceptible-infected-susceptible (SIS), raises the possibil- 
ity that the outbreak dies out if nodes do not transmit 
before recovery. Node spreading power can here be mea- 
sured by estimating a node's probability of seeding an 
epidemic. Node spreading power can thirdly be assessed
via a competitive spreading process in which two mutually
hostile infections invade a network. This problem has
been attracting considerable attention in the algorithmics
and social networks communities, where it is now
known that determining the optimal starting point(s) is
NP-hard [15]. In scale-free networks, where growth in
prevalence is near instantaneous [16] [17], with asymptotic
time to full coverage log(log(n)) [15], victory goes
to the team which is nearer to instantaneous. The relative
spreading power of the seed nodes determines the
outcome. We conclude by assessing the spreading power 
of nodes formed by junctions in the Uganda road net- 
work, in the context of the recent Ebola outbreaks. 

Define the degree of a cluster of nodes as the number
of edges connecting nodes within the cluster to nodes
outside it. Then the Expected Reach of node i, $ER_X(i)$,
is the expectation of the degree of the infected cluster
formed after X infections seeded from i. As the expected
time between infections is inversely proportional to the
number of edges from infected to susceptible individuals,
it is often preferable to work with the inverse of $ER_X$,
the Expected Wait $EW_X$. In a continuous time process
seeded from node i, the expected wait until infection X + 1,
given that X infections have occurred, is
$EW_X(i) = \beta / ER_X(i)$, where $\beta$ is the transmission
probability along a single edge. The distribution of ER3
values is shown in the histogram in Figure 1.

FIG. 1. (Top) Histogram of ER3 values from the 75% of
nodes which are three or more hops distant from a hub, taken
over 10 random networks. (Bottom) Histograms of ER3 values
from the four random networks.

The $ER_X(i)$ can be found by enumerating all possible
clusters of infected nodes which could occur after X
infection events originating from i and taking the mean
cluster degree of this set. For X = 0, the cluster is node
i, and $ER_0$ is the degree of i. For X = 1, the clusters are
the set of all pairs {i,j} where j is a neighbour of i, the
degree of each cluster is $\deg(i,j) = \deg(i) + \deg(j) - 2$,
and $ER_1$ is the mean of $\deg(i,j)$ taken over all j which
are neighbours of i.
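
The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the C++ code used for the results. It assumes a simple undirected graph stored as a dictionary of neighbour sets, and it assumes that each distinct cluster (set of infected nodes) carries equal weight in the mean, as the X = 1 case suggests. The names `cluster_degree` and `expected_reach` are hypothetical.

```python
def cluster_degree(adj, cluster):
    """Number of edges joining a node inside `cluster` to a node outside it."""
    return sum(1 for u in cluster for v in adj[u] if v not in cluster)


def expected_reach(adj, seed, x):
    """Mean cluster degree over all distinct clusters possible after x infections.

    Assumption: every distinct cluster is weighted equally.
    """
    clusters = {frozenset([seed])}
    for _ in range(x):
        grown = set()
        for cluster in clusters:
            for u in cluster:
                for v in adj[u]:
                    if v not in cluster:
                        # One further infection event: u infects susceptible v.
                        grown.add(cluster | {v})
        if not grown:
            # The whole component is infected; no susceptible edges remain.
            return 0.0
        clusters = grown
    return sum(cluster_degree(adj, c) for c in clusters) / len(clusters)


# Path graph 0 - 1 - 2 - 3, seeded at node 1.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
er0 = expected_reach(path, 1, 0)  # the degree of node 1
er1 = expected_reach(path, 1, 1)  # mean of deg(1,0) = 1 and deg(1,2) = 2
```

On this path graph `er0` is 2.0 and `er1` is 1.5, matching deg(i) and the mean of deg(i) + deg(j) - 2 over the two neighbours of node 1.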

Results are based on extensive simulations conducted
both on random scale-free networks with $2^{13}$ nodes and
real-world networks with $2^{14}$ to $2^{21}$ nodes. The random
networks are generated to have a Pareto (1, 2.3) degree 
distribution under the Chung Lu protocol [T5]. Real 
world networks include: the collaboration network from
ArXiv Astrophysics [20], Enron emails [21], the Slashdot
Zoo signed social network from Feb 21 2009 [21], and
Amazon co-purchases [22]; see Table I.

TABLE I. Characteristics of the random and real networks, including
the number of nodes, largest eigenvalue $\alpha$, and graph density.

                    nodes      $\alpha$    density
Random              8,192      12.2        4.32e-04
Astrophysics [20]   18,772     94.4        0.22e-04
Enron [21]          36,692     118.4       2.73e-04
Slashdot [21]       82,168     124.7       1.61e-04
Amazon [22]         262,111    5.3         0.26e-04
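
As an illustration of how random networks of the kind summarized in Table I might be generated, the following Python sketch draws expected degrees from a Pareto distribution and connects nodes under a Chung-Lu style rule. It is a sketch under stated assumptions, not the generator used for the results: "Pareto (1, 2.3)" is read here as scale 1 and shape 2.3, each pair (i, j) is joined with probability min(1, w_i w_j / sum of w), and the function names are hypothetical.

```python
import random


def pareto_weights(n, shape=2.3, scale=1.0, rng=None):
    """Expected degrees drawn from a Pareto distribution (assumed scale, shape)."""
    rng = rng or random.Random()
    return [scale * rng.paretovariate(shape) for _ in range(n)]


def chung_lu_graph(weights, rng=None):
    """Join each pair (i, j) independently with probability min(1, w_i * w_j / sum(w))."""
    rng = rng or random.Random()
    n = len(weights)
    total = sum(weights)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, weights[i] * weights[j] / total):
                adj[i].add(j)
                adj[j].add(i)
    return adj


rng = random.Random(1)
graph = chung_lu_graph(pareto_weights(200, rng=rng), rng=rng)
```

The pairwise loop is quadratic in the number of nodes, which is acceptable for a small demonstration but slow in pure Python at $2^{13}$ nodes.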



The ER3 of all peripheral nodes is measured, with peripheral
nodes defined as those three or more hops distant
from the closest hub, and hubs defined as nodes with degree
greater than 60% of the maximum node degree in the
network. Approximately 75% of the nodes in the random
networks meet this criterion. The Slashdot and Amazon
networks have over $2^{16}$ nodes; in these cases only nodes
greater than three hops from a hub are considered. In
the random networks, ER3 is quantized to EW3 by
truncating the inverse to the next lower hundredth, i.e.
$ER_3 \in (20, 25] \rightarrow EW_3 = 0.04$. This gives an
approximately equal number of nodes for each value of EW3.
In the real-world networks, the quantization is independently
scaled to give approximately uniform representation
of the lower values of the resulting EW3.
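
The selection and quantization steps above can be sketched as follows. This is a hedged illustration rather than the original implementation: hop distance to the closest hub is found with a multi-source breadth-first search, nodes unreachable from every hub are treated as peripheral (an assumption), and the names `peripheral_nodes` and `quantized_wait` are hypothetical.

```python
import math
from collections import deque


def peripheral_nodes(adj, hub_fraction=0.6, min_hops=3):
    """Nodes at least `min_hops` from every hub (degree > hub_fraction * max degree)."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    hubs = [v for v, nbrs in adj.items() if len(nbrs) > hub_fraction * max_deg]
    dist = {h: 0 for h in hubs}
    queue = deque(hubs)
    while queue:  # multi-source BFS outward from all hubs at once
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Assumption: nodes unreachable from any hub count as peripheral.
    return [v for v in adj if dist.get(v, math.inf) >= min_hops]


def quantized_wait(er3):
    """Truncate 1 / ER3 to the next lower hundredth."""
    return math.floor(100.0 / er3) / 100.0


# Node 0 is the only hub; nodes 3 and 4 are three or more hops away from it.
toy = {0: {1, 5, 6, 7}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3},
       5: {0}, 6: {0}, 7: {0}}
far = sorted(peripheral_nodes(toy))
```

With this rule any ER3 in (20, 25] maps to 0.04, for example `quantized_wait(25)` and `quantized_wait(20.5)`.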

Expected reach is predictive of the mean time for an infection
originating at a peripheral node to cover half the
network in a disease without recovery simulated in continuous
time (SI model). Expected time to half coverage
(tthc) is measured by simulating $2^7$ epidemics for each
seed node, measuring the time until half the nodes are
infected, and fitting the measurements to an exponential
distribution. Seed nodes are five randomly selected nodes
at each observed value of EW3 on $2^5$ random networks.
For higher (and thus rarer) EW3 values, it is not always 
possible to select five nodes; in such cases all observed 
EW 3 values are used. The accessibility metric [T5] is 
also measured for each node. Logistic regression over the
resulting 4563 observations shows that both EW3 and
accessibility are highly predictive of tthc, with longer expected
wait associated with longer tthc. The EW3, however,
explains more of the deviance (71% vs 48%) and has
a lower AIC (-12376 vs -9831). A similar procedure was
performed on the real-world networks, again achieving
significance in all cases as detailed in Table II.
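
A continuous-time SI epidemic of the kind measured above can be simulated with a standard event-driven (Gillespie-type) scheme. The sketch below is illustrative only and is not the simulation code behind the reported results. It assumes each infected-susceptible edge transmits at rate `beta`, so the wait to the next infection is exponential with rate `beta` times the number of such edges. The function names are hypothetical.

```python
import math
import random


def time_to_half_coverage(adj, seed, beta=1.0, rng=None):
    """Simulated time until half of the nodes are infected (SI, continuous time)."""
    rng = rng or random.Random()
    target = math.ceil(len(adj) / 2)
    infected = {seed}
    frontier = [(seed, v) for v in adj[seed]]  # infected-susceptible edges
    t = 0.0
    while len(infected) < target:
        if not frontier:
            return math.inf  # the seed's component is smaller than half the network
        # Assumption: each infected-susceptible edge fires at rate beta.
        t += rng.expovariate(beta * len(frontier))
        _, v = rng.choice(frontier)
        infected.add(v)
        frontier = [(a, b) for a, b in frontier if b != v]
        frontier.extend((v, w) for w in adj[v] if w not in infected)
    return t


def mean_tthc(adj, seed, beta=1.0, runs=2 ** 7, rng=None):
    """Sample mean over repeated runs (the MLE of an exponential mean)."""
    rng = rng or random.Random()
    times = [time_to_half_coverage(adj, seed, beta, rng) for _ in range(runs)]
    return sum(times) / runs


k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
t_half = time_to_half_coverage(k4, 0, rng=random.Random(0))
```

Rebuilding the frontier list after every infection keeps the sketch short; a production implementation would maintain it incrementally.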

The expected reach is indicative of a node's ability to
seed a sustained epidemic in a disease with unit time recovery
simulated in discrete time (SIS model). Epidemic
potential is measured by simulating 100 epidemics and
counting how many persist for 50 iterations [23]. Klemm
et al. propose that for a discrete time process with unit
recovery, the critical transmission probability $\beta$ separating
the extinction from the endemic regime is the inverse
of the largest eigenvalue $\alpha$ of the adjacency matrix [7].
This suggests that $\alpha$ is also a measure of a network's
susceptibility to infection. Transmission probability is
here set to $\beta = 5/\alpha$, placing the simulated epidemic well
within the epidemic regime, with $\alpha$ determined independently
for each random network. Five seed nodes are
sampled for each observed EW3 value on $2^6$ random networks.
A generalized additive model of the resulting 8678
observations shows that the combination of ER3 and $\beta$
(thus controlling for the network's susceptibility to epidemic)
explains 68% of the deviance in the probability
that a node can start an epidemic. Due to the large number
of observations at low ER3, the relationship is more
clearly illustrated in terms of the expected wait (Figure
2). The ER3 also explains a significant portion of the
deviance in the real-world networks, as shown in Table
II. Setting $\beta = 5/\alpha$ did not produce consistent results
in the real world networks. No nodes had high epidemic
potential at this level, with the exception of the Amazon
network in which all did. To control for this variability,
$\beta$ is tuned such that nodes with high ER3 have epidemic
potential approaching 100%. In the astrophysics collaboration
network, this requires setting $\beta = 14/\alpha$, reducing
the model fit from explaining 53% (at $\beta = 5/\alpha$) to 47%
of the deviance.

FIG. 2. (Color online) In an SIS model, disease which cannot
rapidly establish itself in the population dies out. A node's
EW3 is associated with its ability to seed an epidemic.
Disease transmissibility, $\beta$, is set to five times the critical
value separating the endemic/extinction regimes.
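
The SIS experiment can be illustrated with the sketch below, which estimates the largest adjacency eigenvalue by power iteration and then counts how many simulated outbreaks persist. It is a simplified reading of the procedure, not the original code. It assumes that every node infected at one step attempts to infect each neighbour with probability `beta` and then recovers, and that a recovering node may be reinfected in the same step. The function names are hypothetical.

```python
import math
import random


def largest_eigenvalue(adj, iters=200):
    """Power iteration on A + I; the shift avoids oscillation on bipartite graphs."""
    n = len(adj)
    x = {v: 1.0 / math.sqrt(n) for v in adj}
    lam = 0.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}  # (A + I) x
        lam = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / lam for v, val in y.items()}
    return lam - 1.0


def sis_persists(adj, seed, beta, steps=50, rng=None):
    """True if a discrete-time SIS outbreak with unit recovery survives `steps` steps."""
    rng = rng or random.Random()
    infected = {seed}
    for _ in range(steps):
        # Assumption: infected nodes transmit along each edge with probability
        # beta, then recover; only newly infected nodes carry over.
        infected = {v for u in infected for v in adj[u] if rng.random() < beta}
        if not infected:
            return False
    return True


def epidemic_potential(adj, seed, beta, runs=100, steps=50, rng=None):
    """Fraction of simulated outbreaks that persist."""
    rng = rng or random.Random()
    return sum(sis_persists(adj, seed, beta, steps, rng) for _ in range(runs)) / runs


k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
alpha = largest_eigenvalue(k4)
beta = min(1.0, 5.0 / alpha)  # five times the critical value, capped at 1
```

For a small dense graph such as `k4`, 5/alpha exceeds 1, hence the cap; on the large sparse networks of Table I the value stays well below 1.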



TABLE II. Expected reach explains a significant percentage
of the deviance in node time to half coverage (SI model) and
probability of seeding an epidemic (SIS model) in real world
networks.

                       SI model                   SIS model
               dev explained   p-value    dev explained   p-value
Astrophysics        47%        1.10e-07        47%        < 2^-16
Enron               38%        9.95e-06        15%        1.2e-06
Slashdot            32%        0.037           17%        1.24e-04
Amazon              65%        0.0005          80%        < 2^-16



The expected reach predicts the outcome of a competitive
spreading process simulated in continuous time.
Here, we use a toy example of a zombie apocalypse met
by concurrent spread of education in zombie hunting.
Both zombies and hunters recruit from the susceptible
population and mutually eliminate each other. For each
network, a grid is formed for all possible pairings of observed
EW3 values. Ten epidemics are simulated at each
point in the grid with one initial zombie and one initial
hunter chosen randomly from nodes with EW3 values
corresponding to the grid point value. The apocalypse is
simulated in continuous time with the base rate of zombification
and of hunter training equal. The outcome measure
is the number of times (out of ten) the humans win,
averaged at each point of the grid over 2 random networks.
An analogous process is applied to the real-world
networks. In the random networks, the spreading process
with the lower expected waiting time wins at least 50% of
the time. The larger the difference in EW3, the greater
the margin of victory. Results are similar, though less
distinct, in the real world networks (Figure 3).

FIG. 3. (Color online) The winning side in the competitive
rumor spreading process is accurately predicted by EW3; the
side with the shorter expected wait wins at least 50% of the
time, with greater difference giving greater advantage. The
heatmap shows how many trials out of ten are won by the
humans, with points on the grid indicating the EW3 of the
initial zombie and hunter node. Blue is good, red is bad. Upper
plot: random networks; Lower plot: real world networks.
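
The competitive process can be illustrated with the following sketch. The text does not specify the interaction rules in detail, so several assumptions are made and should be read as such: a zombie or hunter converts a susceptible neighbour, a zombie-hunter contact removes both, all events share the same rate, and the side holding more nodes when no events remain is declared the winner. Because all rates are equal, the next event is uniform over the active edges, so only the event sequence is simulated, not the event times. The name `zombie_outbreak` is hypothetical.

```python
import random


def zombie_outbreak(adj, zombie_seed, hunter_seed, rng=None):
    """Return 'humans', 'zombies', or 'tie' for one simulated contest."""
    rng = rng or random.Random()
    state = {v: "S" for v in adj}
    state[zombie_seed] = "Z"
    state[hunter_seed] = "H"
    while True:
        events = []
        for u in adj:
            if state[u] not in ("Z", "H"):
                continue
            for v in adj[u]:
                if state[v] == "S":
                    events.append((u, v))  # recruitment of a susceptible
                elif state[u] == "Z" and state[v] == "H":
                    events.append((u, v))  # zombie-hunter contact, counted once
        if not events:
            break
        # Equal rates (assumed): the next event is uniform over active edges.
        u, v = rng.choice(events)
        if state[v] == "S":
            state[v] = state[u]
        else:
            state[u] = state[v] = "R"  # mutual elimination (assumed rule)
    zombies = sum(1 for s in state.values() if s == "Z")
    hunters = sum(1 for s in state.values() if s == "H")
    if hunters > zombies:
        return "humans"
    return "zombies" if zombies > hunters else "tie"


# An isolated zombie cannot spread, so the hunters take the other component.
split = {0: set(), 1: {2}, 2: {1, 3}, 3: {2}}
outcome = zombie_outbreak(split, zombie_seed=0, hunter_seed=1)
```

Recomputing the event list at every step is quadratic in the worst case; it keeps the sketch short but would need an incremental structure for the network sizes studied here.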

Finally, we turn our attention to the recent Ebola outbreak
in Uganda. We make the simplifying assumption
that the disease transmits between communities along
the road network, and we regard this network as an unweighted,
undirected graph with nodes at each road junction.
Road data current as of 2009 is available from Humanitarian
Response. The 2012 epidemic originated
in the village of Kigadi, which sits at the junction of three
roads (Figure 4). The ER3 of this junction is 5.1, implying
that it has limited spreading power. In fact, the
majority of peripheral nodes have ER3 < 6, suggesting
that Ebola would have a hard time transitioning from
a local to a national epidemic even with a less vigorous
response from Médecins Sans Frontières.

FIG. 4. (Color online) The Uganda road network. Towns
are blue dots. The red bullseye marks the village of Kigadi.

While ER3 is sufficient for predictive purposes, the
expectation is not unproblematic. The distribution over
which the expectation is taken is strongly bimodal. Most
(80%) of the nodes analyzed here are three hops distant
from a hub. The infection clusters which contain this
hub typically have reach exceeding 300, while the clusters
which do not typically have reach less than 100.

Computational expense is a concern, but generally not
a problem in practice. The number of clusters grows
factorially in the degree of nodes reachable from i after
X hops. This problem is not severe, however, as the
measure is designed for nodes where this degree is small,
and X = 3 is sufficient to determine the outcome, a result
supported by the path counting literature [6] [7]. Running
time for our non-optimized C++ code is comparable to
that reported by Bauer et al. for counting all paths of
length 4 under a SIR model [6].

Arguments explaining that path counting is categorically
different from degree or centrality based measures
[12] also apply to the expected cluster degree. In networks
with heterogeneous degree distributions, node degree
is generally not correlated with that of its neighbours.
Chained nodes with high betweenness centrality can lead to low
expected reach. Expected cluster degree is also distinct,
though not categorically, from path counting in that
it directly measures the expected number of infected-susceptible edges.